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Abstract. Assembling a gene from candidate exons is an important 
problem in computational biology. Among the most successful approaches 
to this problem is spliced alignment, proposed by Gelfand et al., which 
scores different candidate exon chains within a DNA sequence of length 
m by comparing them to a known related gene sequence of length n, 
m = 0{n). Gelfand et al. gave an algorithm for spliced alignment running 
in time O(n^). Kent et al. considered sparse spliced alignment, where the 
number of candidate exons is 0(n), and proposed an algorithm for this 
problem running in time 0(n^'^). We improve on this result, by proposing 
an algorithm for sparse spliced alignment running in time 0(n^-2^). Our 
approach is based on a new framework of quasi-local string comparison. 

1 Introduction 

Assembling a gene from candidate exons is an important problem in compu- 
tational biology. Several alternative approaches to this problem have been de- 
veloped over time. Among the most successful approaches is spliced alignment 
[B^ , which scores different candidate exon chains within a DNA sequence by com- 
paring them to a known related gene sequence. In this method, the two sequences 
are modelled respectively by strings a, b of lengths m, n. We usually assume that 
m = 0{n). A subset of substrings in string a are marked as candidate exons. 
The comparison between sequences is made by string alignment. Gelfand et al. 
[B] give an algorithm for spliced alignment running in time 0{n^). 

In general, the number of candidate exons k may be as high as O(n^). The 
method of sparse spliced alignment makes a realistic assumption that, prior to 
the assembly, the set of candidate exons undergoes some filtering, after which 
only a small fraction of candidate exons remains. Kent et al. [9] give an algorithm 
for sparse spliced alignment that, in the special case k = 0{n), runs in time 
0(n^-^). For asymptotically higher values of k, the algorithm provides a smooth 
transition in running time to the dense case k — 0{n^), where its running time 
is asymptotically equal to the general spliced alignment algorithm of [6]. 

In this paper, we improve on the results of [9], by proposing an algorithm 
for sparse spliced alignment that, in the special case k = 0{n), runs in time 
0{n^'^^). Like its predecessor, the algorithm also provides a smooth transition 
in running time to the dense case. Our approach is based on a new framework of 
quasi-local string comparison, that unifies the semi-local string comparison from 
|12] and fully-local string comparison. 



This paper is a sequel to paper [12^ ; we include most of its relevant material 
here for completeness. However, we omit some definitions and proofs due to 
space constraints, referring the reader to [T2] for the details. 

2 Semi-local longest common subsequences 

We consider strings of characters from a fixed finite alphabet, denoting string 
concatenation by juxtaposition. Given a string, we distinguish between its con- 
tiguous substrings, and not necessarily contiguous subsequences. Special cases of 
a substring are a prefix and a suffix of a string. Given a string a, we denote 
by a^*^^ and 0(j;) respectively its prefix and suffix of length k. For two strings 
a = a\a2 ■ ■ ■ am and b — P1P2 ■ ■ ■ Pn of lengths m, n respectively, the longest 
common subsequence (LCS) problem consists in computing the length of the 
longest string that is a subsequence both of a and b. We will call this length the 
LCS score of the strings. 

We define a generalisation of the LCS problem, which we introduced in [T^] 
as the all semi-local LCS problem. It consists in computing the LCS scores on 
substrings of a and b as follows: 

• the all string- sub string LCS problem: a against every substring of b; 

• the all prefix-suffix LCS problem: every prefix of a against every suffix of 6; 

• symmetrically, the all substring-string LCS problem and the all suffix-prefix 
LCS problem, defined as above but with the roles of a and b exchanged. 

It turns out that by considering this combination of problems rather than each 
problem separately, the algorithms can be greatly simplified. 

A traditional distinction, especially in computational biology, is between 
global (full string against full string) and local (all substrings against all sub- 
strings) comparison. Our problem lies in between, hence the term "semi-local". 
Many string comparison algorithms output either a single optimal comparison 
score across all local comparisons, or a number of local comparison scores that 
are "sufficiently close" to the globally optimal. In contrast with this approach, 
we require to output all the locally optimal comparison scores. 

In addition to standard integer indices . . . , —2, —1, 0, 1,2,.. ., we use odd half- 
integer indices . . . , — |, — |,— 1,1,|,|,.... For two numbers i, j, we write i ^ j 
if j ~ i £ {0, 1}, and i < j if j — i — 1. We denote 

[i : j] ^{i,i-{-l,...,j - 

{i:j) = {i+^,i+l,...,j-l,j-^} 

To denote infinite intervals of integers and odd half-integers, we will use —00 for 
i and +cxd for j where appropriate. For both interval types [i : j] and {i : j), we 
call the difference j — i interval length. 

We will make extensive use of finite and infinite matrices, with integer ele- 
ments and integer or odd half-integer indices. A permutation matrix is a (0,1)- 
matrix containing exactly one nonzero in every row and every column. An iden- 
tity matrix is a permutation matrix /, such that — 1 li i — j, and 
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Fig. 1. An alignment dag and its implicit highest-score matrix 

— otherwise. Each of these definitions applies both to finite and in- 
finite matrices. 

A finite permutation matrix can be represented by its nonzeros' index set. 
When we deal with an infinite matrix, it will typically have a finite non-trivial 
core, and will be trivial (e.g. equal to an infinite identity matrix) outside of 
this core. An infinite permutation matrix with finite non-trivial core can be 
represented by its core nonzeros' index set. 

Let D be an arbitrary numerical matrix with indices ranging over (0 : n) . Its 
distribution matrix, with indices ranging over [0 : n], is defined by 

d{io,jo) = ^D{i,j) i e {io : n),j e {0 : jo) 

for all io , jo G [0 : n] . 

When matrix d is a distribution matrix of D, matrix D is called the density 
matrix of d. The definitions of distribution and density matrices extend natu- 
rally to infinite matrices. We will only deal with distribution matrices where all 
elements are defined and finite. 

We will use the term permutation- distribution matrix as an abbreviation of 
"distribution matrix of a permutation matrix" . 

We refer the reader to |12j for the definition of alignment dag. In the context of 
the alignment dag, a substring a^ai+i . . .aj corresponds to the interval [i — 1 : j]; 
we will make substantial use of this correspondence in Section |4j 

We also refer the reader to |12j for the definitions of (extended) highest-score 
matrix, and of its implicit representation. Figure [T]shows an alignment dag of two 
strings, along with the nonzeros of its implicit highest-score matrix. In particular, 
a nonzero {i,j), where i,j G (0 : n), is represented by a "seaweed" curv^ 
originating between the nodes Vq ^^i and and terminating between the 

^ For the purposes of this illustration, the specific layout of the curves between their 
endpoints is not important. However, notice that each pair of curves have at most 
one crossing, and that the same property is true for highest-scoring paths in the 
alignment dag. 



nodes and u„ The remaining curves, originating or terminating at 

the sides of the dag, correspond to nonzeros where either i ^ {0 : n) or 

j ^ (0 : n). For details, see [T^ . 

Essentially, an extended highest-score matrix represents in a unified form the 
solutions of the string-substring, substring-string, prefix-suffix and suffix-prefix 
LCS problems. In particular, row of this matrix contains the LCS scores of 
string a against every prefix of string b. When considering such an array of n -I- 1 
LCS scores on its own, we will call it highest-score vector for a against b. Every 
highest-score vector will be represented explicitly by an integer array of size 
n + I (as opposed to the implicit representation of the complete highest-score 
matrix, which allows one to store all the rows compactly in a data structure of 
size 0{m + n)). 

3 Fast highest-score matrix multipHcation 

Our algorithms are based on the framework for the all semi-local LCS problem 
developed in [T5], which refines the approach of 

A common pattern in the problems considered in this paper is partitioning 
the alignment dag into alignment subdags. Without loss of generality, consider 
a partitioning of an (M -I- m) x n alignment dag G into an M x n alignment dag 
Gi and an m x n alignment dag G2, where M > m. The dags Gi, G2 share a 
horizontal row of n nodes, which is simultaneously the bottom row of Gi and 
the top row of G2; the dags also share the corresponding n — 1 horizontal edges. 
We will say that dag G is the concatenation of dags Gi and G2. Let A, B, C 
denote the extended highest-score matrices defined respectively by dags Gi, G2, 
G. In every recursive call our goal is, given matrices A, B, to compute matrix G 
efficiently. We call this procedure highest-score matrix multiplication. 

The implicit representation of matrices A, B, C consists of respectively M-{-n, 
m + n, M -\- m -\- n non-trivial nonzeros. 

The results of this paper are based on the following results from [T^] ; see the 
original paper for proofs and discussion. 

Definition 1. Let n e N. Let A, B, C be arbitrary numerical matrices with 
indices ranging over [0 : n]. The (min, -|-)-product A Q B — C is defined by 
G{i,k) = nimj(^A{i,j) -\- B{j,k)'j, where i,j,k £ [0 : n]. 

Lemma 1 ( jl2p . Let Da, Dg, Dc be permutation matrices with indices rang- 
ing over (0 : n), and let dA, ds, dc be their respective distribution matrices. Let 
dAQ ds — dc- Given the set of nonzero elements' index pairs in each of Da, 
Db, the set of nonzero elements' index pairs in Dc can be computed in time 
0(n^'^) and memory 0{n). 

Lemma 2 (|jl2j). Let Da, Db, Dc be permutation matrices with indices rang- 
ing over (—00 : -I-cxd), such that 



DA{l,j)^I{i,j) 

DBU,k)=I{j,k) 



for i,j £ {-00 : 0) 
for j, k £ {n : +00) 
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Fig. 2. An illustration of Lemma [2] 



Let dA, ds, dc he their respective distribution matrices. Let dA Q ds — dc- We 
have 

DA{i,j) = Dc{i,j) for i e (-oo : +oo), j E {n : +oo) (1) 

DbU, k) = DcU, k) for j e (-oo : 0), /c G {-^ : (2) 

Given the set of all n remaining nonzero elements' index pairs in each of Da, 
Db, i-e. the set of all nonzero elements' index pairs {i, j) in Da and {j,k) in Db 
with i G (0 : +oo), j G (0 : n), k G {—oo : 0), the set of all n remaining nonzero 
elements' index pairs in Dc can be computed in time 0{n^'^) and memory 0{n). 

The lemma is illustrated by Figure |2] Three horizontal lines represent respec- 
tively the index ranges of i, j, k. The nonzeros in Da and Db are shown respec- 
tively by top-to-middle and middle-to-bottom "seaweed" curves. The nonzeros 
in Dc described by ([l]) , ([2| are shown by top-to-bottom thick "seaweed" curves. 
The remaining nonzeros in Dc are not shown; they are determined by appli- 
cation of Lemma [l] from nonzeros in Da and Db shown by top-to-middle and 
middle-to-bottom thin "seaweed" curves. 

Lemma [2] gives a method for multiplying infinite permutation-distribution 
matrices, in the special case where both multiplicands have semi-infinite core. 
We now consider the complementary special case, where one multiplicand's core 
is unbounded, and the other's is finite. 

Lemma 3. Let Da, Db, Dc be permutation matrices with indices ranging over 
{—oo : -|-oo), such that 

Db{], k) = /(j, k) for J, k G (-00 : 0) U (n : +^) 

Let dA, dB, dc be their respective distribution matrices. Let dA Q dB — dc. We 
have 

DA{i,j) = Dc{i,j) for i G (-oo : -|-oo), j G (-oo : 0) U (n : +oo> (3) 

Given the set of all n remaining nonzero elements' index pairs in each of Da, 
Db, i.e. the set of all nonzero elements' index pairs {i,j) in Da and (j, k) in Db 
with i G (— oo : -|-oo), j, fc G (0 : n), the set of all n remaining nonzero elements' 
index pairs in Dc can be computed in time 0(n^'^) and memory 0{n). 



n 




Fig. 3. An illustration of Lemma [3] 



Proof. By Lemma [T] see Appendix. □ 

The lemma is illustrated by Figure [3] using the same conventions as Figure [2] 

Lemma 4. Consider the concatenation of alignment dags as described above, 
with highest-score matrices A, B, C . Given the implicit representations of A, 
B, the implicit representation of C can be computed in time 0(M + m'^'^n) and 
memory 0{M + n). 

Proof. By Lemma |3] see Appendix. □ 

We will also need a separate efficient algorithm for obtaining highest-score 
vectors instead of full highest-score matrices. This algorithm, which we call 
highest-score matrix-vector multiplication, is complementary to the highest-score 
matrix multiplication algorithm of Lemma [T| An equivalent procedure is given 
(using different terminology and notation) in ^10i5|9i . based on techniques from 

m- 

Lemma 5 ( jl0|l5ll9] ). Let Da be a permutation matrix with indices ranging 
over (0 : n) , and let dA be its distribution matrix. Let x, y be numerical (column) 
vectors with indices ranging over (0 : n). Let dAQx = y. Given the set of nonzero 
elements' index pairs in Da, and the elements of x, the elements of y can be 
computed in time O(nlogn) and memory 0{n). 



4 Quasi-local string comparison 

Consider an arbitrary set of substrings of string a. We call substrings in this set 
prescribed substrings, and denote their number by k. Our aim is to compare the 
LCS scores on substrings of a and b as follows: 

• the quasi-local LGS problem: every prescribed substring of a against every 
substring of b. 

This problem includes as special cases the semi-local string comparison from 
[T?| and fully-local string comparison, as well as length-constrained local align- 
ment from [2]. Note that the solution of the quasi- local LCS problem can be 



represented in space 0{kn) by giving the implicit highest-score matrix for each 
prescribed substring of a against h. An individual quasi-local LCS score query 
can be performed on this data structure in time 0(log^ n) (or even O( io'°kig„ ) 
with a higher multiplicative constant). 

In the rest of this section, we propose an efficient algorithm for the quasi- 
local LCS problem. For simplicity, we first consider the case k = 0{m). Intervals 
corresponding to prescribed substrings of a will be called prescribed intervals. 

Algorithm 1 (Quasi- local LCS). 

Input: strings a, 6 of length m, n, respectively; a set of fc = 0{in) endpoint 
index pairs for the prescribed substrings in a. 

Output: implicit highest-score matrix for every prescribed substring of a against 
full b. 

Description. For simplicity, we assume that m is a power of 4. We call an 
interval of the form [fc • 2^* : (A; -|- 1) • 2^*], fc, s S Z, as well as the corresponding 
substring of a, canonical. In particular, all individual characters of a are canonical 
substrings. Every substring of a can be decomposed into a concatenation of 
O(logm) canonical substrings. 

In the following, by processing an interval we mean computing the implicit 
highest-score matrix for the corresponding substring of a against b. 
First phase. Canonical intervals are processed in a balanced binary tree, in order 
of increasing length. Every interval of length 2" = 1 is canonical, and is processed 
by a simple scan of string b. Every canonical interval of length 2^^+^ is processed 
as a concatenation of two already processed half-sized canonical intervals of 
length 2". 

Second phase. We represent each prescribed interval [i, j] by an odd half-integer 
prescribed point {i,j) € (0 : m)^ Q. On the set of prescribed points, we build a 
data structure allowing efficient orthogonal range counting queries. A classical 
example of such a data structure is the range tree U . 

We then proceed by partitioning the square index pair range (0 : m) ^ recur- 
sively into regular half-sized square blocks. 

Consider an ft, x /i block {iq — h : io) x (jo : ja + h) . The computation is 
organised so that when a recursive call is made on this block, either we have 
io > jo, or the interval [ig : jo] is already processed. 

For the current block, we query the number of prescribed points it contains. If 
this number is zero, no further computation on the block or recursive partitioning 
is performed. Otherwise, we have j — i € {—h, 0, h, 2h, . . .}. lij — i = —h, then the 
intervals [io — h : jo], [io : jo + h] have length 0, and the interval [io — h : jo + h] 
is canonical. If j — i > 0, we process the intervals [io — h : jo], [io : jo + h], 
[io — h : jo -I- h] . Each of these intervals can be processed by Lemmaji] appending 
and/or prepending a canonical interval of length h to the already processed 

^ The overall algorithm structure is essentially equivalent to building a one- 
dimensional range tree [4] on the interval [0 : m] , and then performing on this range 
tree a batch query consisting of all the prescribed points. However, in contrast with 
standard range trees, the cost of processing nodes in our algorithm is not uniform. 



interval [iq : jo]- We then perform further partitioning of the block, and call the 
procedure recursively on each of the four subblocks. 

The base of the recursion is /i = 1. At this point, we process all 1 x 1 blocks 
containing a prescribed point, which is equivalent to processing the original 
prescribed intervals. The computation is completed. 
Cost analysis. 

First phase. The computation is dominated by the cost of the bottom level of 
the computation tree, equal to m/2 • 0(n) = 0{mn). 

Second phase. The recursion tree has maximum degree 4, height log m, and 
0{m) leaves corresponding to the prescribed points. 

Consider the top-to-middle levels of the recursion tree. In each level from 
the top down to the middle level, the maximum number of nodes increases by a 
factor of 4, and the maximum amount of computation work per node decreases 
by a factor of 2°'^. Hence, the maximum amount of work per level increases in 
geometric progression, and is dominated by the middle level ^"l'" . 

Consider the middle-to-bottom levels of the recursion tree. Since the tree 
has 0{m) leaves, each level contains at most 0{m) nodes. In each level from 
the middle down to the bottom level, the maximum amount of computation 
work per node still decreases by a factor of 2°'^. Hence, the maximum amount 
of work per level decreases in geometric progression, and is again dominated by 
the middle level 

Thus, the computational work in the whole recursion tree is dominated by the 
maximum amount of work in the middle level ^"l™ . This level has at most 0(rn) 

nodes, each requiring at most 0{mP'^n)/2'^ '^'~^ — O^mP '^^n) work. Therefore, 
the overall computation cost of the recursion is at most 0{m) ■ 0{'mP-^^n) = 
0(mi-25n). □ 

The same algorithm can be applied in the case of general fc, 1 < fc < (™). 
For 1 < A: < rn?^'^, the first phase dominates, so the overall computation cost is 
0{mn). For m?^^ < k < (™), the second phase dominates. The dominant level in 
the recursion tree will have fc nodes, each requiring at most 0{m'^-^n/k'^-^'^) work. 
Therefore, the overall computation cost will be at most k ■ 0{nrfi-^n/k^-^^) = 
0{mP'^k'^''^^n). In the fully-local case fc — (™), the cost is 0{m?n)] the same 
result can be obtained by m independent runs of algorithms from at the 

same asymptotic cost. 



5 Sparse spliced alignment 

We now consider the problem of sparse spliced alignment. We keep the nota- 
tion and terminology of the previous sections; in particular, candidate exons are 
represented by prescribed substrings of string a. We say that prescribed sub- 
string a' = Ui' . . . Uji precedes prescribed substring a" = a;" . . . a^", if j' < i" . 
A chain of substrings is a chain in the partial order of substring precedence. We 
identify every chain with the string obtained by concatenating all its constituent 
substrings in the order of precedence. 



Our sparse spliced alignment algorithm is based on the efficient method of 
quasi-local string comparison developed in the previous section. This improves 
the running time of the bottleneck procedure from . The algorithm also uses 
a generalisation of the standard network alignment method, equivalent to the 
one used by [H]- For simplicity, we describe our algorithm for the special case of 
unit-cost LCS score. 

Algorithm 2 (Sparse spliced alignment). 

Input: strings a, b of length m, n, respectively; a set of fc = 0(m) endpoint 
index pairs for the prescribed substrings in a. 

Output: the chain of prescribed substrings in a, giving the highest LCS score 
against string b. 

Description. The algorithm runs in two phases. 

First phase. By running Algorithm [l] we compute the implicit highest-score 
matrix for every prescribed substring of a against b. 

Second phase. We represent the problem by a dag (directed acyclic graph) on 
the set of nodes Ui , where i £ [0 : m] . For each prescribed substring at . . . aj, the 
dag contains the edge Ui-i — s- Uj. Overall, the dag contains k = 0{m) edges. 

The problem can now be solved by dynamic programming on the representing 
dag as follows. Let s[i, j] denote the highest LCS score for a chain of prescribed 
substrings in prefix string a^'-* against prefix string b'^^\ With each node Vi, we 
associate the integer vector s[i,-]. The nodes are processed in increasing order 
of their indices. For the node Uq, vector s[0, •] is initialised by all zeros. For 
a node Uj, we consider every edge Ui-i — s- Uj, and compute the highest-score 
matrix- vector product between vector s[i — 1, •] and the highest-score matrix 
corresponding to prescribed string . . . aj by the algorithm of Lemma |5] Vector 
s[j, •] is now obtained by taking the elementwise maximum between vector s[j — 
1, •] and all the above highest-score matrix- vector products. 

The solution score is given by the value s[m, n]. The solution chain of pre- 
scribed substrings can now be obtained by tracing the dynamic programming 
sequence backwards from node u„i to node uq. 
Cost analysis. 

First phase. Algorithm [l] runs in time 0{m}'^^n). 

Second phase. For each of the k = 0{m) edges in the representing dag, the 
algorithm of Lemma [s] runs in time 0(n log n). Therefore, the total cost of this 
phase is 0(rn) ■ 0{n\ogn) — 0{mn\ogn). 

The overall cost of the algorithm is dominated by the cost of the first phase, 
equal to 0{m} '^^n). □ 

In the case of general k, the analysis of the previous section can be applied to 
obtain a smooth transition between the sparse and dense versions of the problem. 

By a constant-factor blow-up of the alignment dag, our algorithms can be 
extended from the LCS score to the more general edit score, where the insertion, 
deletion and substitution costs are any constant rationals. 



6 Conclusions 



We have presented an improved algorithm for sparse spHced ahgnment, running 
in time 0(n^'^^), and providing a smooth transition in the running time to 
the dense case. A natural question is whether this running time can be further 
improved. 

Our algorithm is based on the previously developed framework of semi-local 
string comparison by implicit highest-score matrix multiplication. The method 
compares strings locally by the LCS score, or, more generally, by an edit score 
where the insertion, deletion and substitution costs are any constant rationals. It 
remains an open question whether this framework can be extended to arbitrary 
real costs, or to sequence alignment with non-linear gap penalties. 
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A Proof of Lemma |3] 



Proof (Lemma It is straightforward to check equality ([s]), by ([2]) and 
Definition [T| Informally, each nonzero of Dq appearing in ^ is obtained as a 
direct combination of a non-trivial nonzero of Da and a trivial nonzero of Db- All 
remaining nonzeros of Da and Db are non-trivial, and determine collectively the 
remaining nonzeros of Dc- However, this time the direct one-to-one relationship 
between nonzeros of Dq and pairs of nonzeros of Da and Db need not hold. 

Observe that all the nonzeros of Da appearing in ^ with j e (—00 : 0) 
are dominated by each of the remaining nonzeros of Da- Furthermore, none 
of the nonzeros of Da appearing in ^ with j £ {n : +00) can be dominated 
by any of the remaining nonzeros of Da- Hence, the nonzeros appearing in ([sjl 
cannot affect the computation of the remaining nonzeros of Dq- We can therefore 
simplify the problem by eliminating all half-integer indices i, j, k that correspond 
to nonzero index pairs and (j, fc) appearing in ([3|, and then renumbering 
the remaining indices i, so that their new range becomes (0 : n) (which is already 
the range of j, k after the elimination) . More precisely, we define permutation 
matrices D^, D'g, D'^, with indices ranging over (0 : n), as follows. Matrix 
is obtained from Da by selecting all rows i with a nonzero DA{i,j), J £ (0 : n), 
and then selecting all columns that contain a nonzero in at least one (in fact, 
exactly one) of the selected rows. Matrix D'b is obtained from Db hy selecting 
all rows j and columns k, where j, fc G (0 : n). Matrix is obtained from Dc 
by selecting all rows i with a nonzero Dc(i-,k), k £ (0 : n), and then selecting 
all columns that contain a nonzero in at least one (in fact, exactly one) of the 
selected rows. We define d^, d'g, d'^ accordingly. The index order is preserved 
by the above matrix transformation, so the dominance relation is not affected. 
Both the matrix transformation and its inverse can be performed in time and 
memory 0(n). 

It is easy to check that d'A Q dg — d'(j, iS dAQ dB — dc- Matrices D'a, D'b, 
D'q satisfy the conditions of Lemma [l] Therefore, given the set of nonzero index 
pairs of , D'^ , the set of nonzero index pairs of D'fj can be computed in time 
0(n^-^) and memory 0(n). □ 



B Proof of Lemma |4] 

Proof (Lemma By Lemma |3] all but n non-trivial nonzeros of Dc can 
be obtained in time and memory 0{M -f n). We now show how to obtain the 
remaining non-trivial nonzeros in time O(m'^'^n), instead of time 0(ii}'^) given 
by Lemma [3j 

The main idea is to decompose matrix dB into a (min, -f )-product of permutation- 
distribution matrices with small core. The decomposition is described in terms of 
density matrices, and proceeds recursively. In each recursive step, we define in- 
finite permutation matrices D'b, D'b, that are obtained from the density matrix 
Db as follows. 



Recall that non-trivial nonzeros in Db belong to the index pair range {—m : 
n) X (0 : rn + n) . Intuitively, the idea is to split the range of each index into two 
blocks: 

(-m : n) = {-m : f ) U (f : n) 

(0 : m + n> = (0 : f) U (f : m + n) 

Note that the splits are not uniform, and that among the resulting four index 
pair blocks in Db, the block : n) x (O : ^) cannot contain any nonzeros. We 
process the remaining three index pair blocks individually, gradually introducing 
nonzeros in matrices D'^, Dg until they become permutation matrices. Non- 
trivial nonzeros in D'^, D'^ will belong respectively to the index ranges m : 
f ) X (0 : m-H f) and (f : m-hn) X (m-H f : 2m + n). 

First, we consider all nonzeros in Db with indices (j, k) E m : f ) x (O : f ). 
For every such nonzero, we introduce a nonzero in D'g at index pair {j,k). We 
also consider all nonzeros in Db with indices (j, A:) g ( ^ : n) x ( ^ : m -I- n) . For 
every such nonzero, we introduce a nonzero in D'^ at index pair (m -I- j, m + k). 

Now consider all nonzeros in Db with indices (j, k) G (—to : f ) x : m+n). 
There are exactly to such nonzeros. Denote their index pairs by (jo, fco)i Oij ki), 
. . . , fcjn-i), where jo < ji < ■ ■ • < jm-i- For each nonzero with index 

pair {jt,kt), we introduce a nonzero in D'g at index pair (jt, ^^y^ -I- t), and a 
nonzero in Dg at index pair (-^^ + t,m+ kt). 

Finally, we introduce the trivial nonzeros in D'^, D'g at index pairs {j,k), 
k — j = m, outside the above non-trivial ranges. The recursive step is completed. 

Let d'g, dg be the distribution matrices of D'g, Dg. Let d*g — d'g dg, 
and d*(j ^ dA Q d*g ^ dA & d'g Q dg, and define D*g , D'^ accordingly. By the 
construction of the decomposition of ds, matrices Db and Dg (as well as 
and d*g) wee related by a simple shift: for all {i,k), i,k £ (— cx), -l-oo), we have 
DbU, k) — D*g{j, k + m). Consequently, matrices Dc and D'^ are related by a 
similar shift: for all (i, k), i,k £ (— oo, -l-oo), we have Dc{i, k) = D'^{i, k + m). 

The described decomposition process continues recursively, as long as n > to. 
The problem of computing matrix dc is thus reduced, up to an index shift, to 
n/m instances of multiplying permutation-distribution matrices. In every in- 
stance, one of the multiplied matrices has core of size 0{m). By Lemma |3] the 
non-trivial part of every such multiplication can be performed in time 0{m^'^) 
and memory 0{m). The trivial parts of all these multiplications can be combined 
into a single scan of the nonzero sets of Da, Db, and can therefore be performed 
in time and memory 0{M + n). Hence, the whole computation can be performed 
in time 0[M + {n/m) ■ to^-^) = 0{M + m^-^n) and memory 0{M + n). □ 

The decomposition of matrix Db the proof of Lemma |4] is illustrated by 
Figures |4] |5] The rectangle corresponding to Db is split into two half-sized rect- 
angles, corresponding to D'g and D'g. Each of the new rectangles is completed 
to a full-sized rectangle by trivial extension; then, the rectangles are arranged 
vertically with a shift by to. The "seaweed" curves that do not cross the par- 
tition are preserved by the construction, up to a shift by to. The "seaweed" 




Fig. 4. Proof of Lemma |4j the original matrix Db 




Fig. 5. Proof of Lemma |4j the decomposition of Db 



curves that cross the partition are also preserved up to a shift by to, by passing 
them through a parallelogram-shaped "buffer zone" . Note that this construction 
makes the latter class of curves uncrossed in D'g , and preserves all their original 



crossings in Dg. 



